 Tarpon Springs


WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Li, Xiaoxi, Jin, Jiajie, Dong, Guanting, Qian, Hongjin, Wu, Yongkang, Wen, Ji-Rong, Zhu, Yutao, Dou, Zhicheng

arXiv.org Artificial Intelligence

Large reasoning models (LRMs), such as OpenAI-o1 and DeepSeek-R1, demonstrate impressive long-horizon reasoning capabilities. However, their reliance on static internal knowledge limits their performance on complex, knowledge-intensive tasks and hinders their ability to produce comprehensive research reports requiring synthesis of diverse web information. To address this, we propose WebThinker, a deep research agent that empowers LRMs to autonomously search the web, navigate among web pages, and draft reports during the reasoning process. WebThinker integrates a Deep Web Explorer module, enabling LRMs to dynamically search, navigate, and extract information from the web when encountering knowledge gaps. It also employs an Autonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time. To further enhance research tool utilization, we introduce an RL-based training strategy via iterative online Direct Preference Optimization (DPO). Extensive experiments on complex reasoning benchmarks (GPQA, GAIA, WebWalkerQA, HLE) and scientific report generation tasks (Glaive) demonstrate that WebThinker significantly outperforms existing methods and strong proprietary systems. Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems. The code is available at https://github.com/RUC-NLPIR/WebThinker.
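As a rough illustration of the Direct Preference Optimization objective mentioned in the abstract, the sketch below computes the standard per-pair DPO loss from sequence log-probabilities. It is not taken from the WebThinker code. The function name, argument names, and the default `beta` are assumptions made for this example, and the iterative online data collection described by the authors is not shown.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    All inputs are summed log-probabilities of a full response.
    beta (assumed default 0.1) scales the implicit reward margin.
    """
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy identical to the reference has zero margin.
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# A policy that raises the chosen response and lowers the rejected one.
print(dpo_loss(-3.0, -7.0, -5.0, -5.0))
```

The loss falls below log 2 once the policy prefers the chosen response more strongly than the reference model does, and rises above it in the opposite case.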


Agentic Reinforced Policy Optimization

Dong, Guanting, Mao, Hangyu, Ma, Kai, Bao, Licheng, Chen, Yifei, Wang, Zhongyuan, Chen, Zhongxia, Du, Jiazhen, Wang, Huiyang, Zhang, Fuzheng, Zhou, Guorui, Zhu, Yutao, Wen, Ji-Rong, Dou, Zhicheng

arXiv.org Artificial Intelligence

Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. In realistic reasoning scenarios, LLMs can often utilize external tools to assist in task-solving processes. However, current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. To bridge this gap, we propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents. Through preliminary experiments, we observe that LLMs tend to exhibit highly uncertain behavior, characterized by an increase in the entropy distribution of generated tokens, immediately following interactions with external tools. Motivated by this observation, ARPO incorporates an entropy-based adaptive rollout mechanism, dynamically balancing global trajectory sampling and step-level sampling, thereby promoting exploration at steps with high uncertainty after tool usage. By integrating an advantage attribution estimation, ARPO enables LLMs to internalize advantage differences in stepwise tool-use interactions. Our experiments across 13 challenging benchmarks in computational reasoning, knowledge reasoning, and deep search domains demonstrate ARPO's superiority over trajectory-level RL algorithms. Remarkably, ARPO achieves improved performance using only half of the tool-use budget required by existing methods, offering a scalable solution for aligning LLM-based agents with real-time dynamic environments. Our code and datasets are released at https://github.com/dongguanting/ARPO
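The entropy signal described in the abstract can be sketched as follows. This is a minimal illustration, not the ARPO implementation: `token_entropy` computes the Shannon entropy of one decoding step from its logits, and `should_branch` is a hypothetical rule (with an assumed threshold of 0.2 nats) that requests extra step-level samples when entropy rises after a tool call.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one step's logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_branch(baseline_entropy, post_tool_entropy, threshold=0.2):
    """Hypothetical rule: branch when entropy rises by more than `threshold`
    after a tool response is appended to the context."""
    return (post_tool_entropy - baseline_entropy) > threshold


# Illustrative logits: a peaked distribution before the tool call,
# a flat one right after it.
before_tool = token_entropy([8.0, 0.0, 0.0, 0.0])
after_tool = token_entropy([1.0, 1.0, 1.0, 1.0])
print(before_tool, after_tool, should_branch(before_tool, after_tool))
```

A uniform distribution over four tokens has entropy log 4, the maximum for that vocabulary size, so a jump of this kind is the sort of high-uncertainty step where additional sampling would be spent under such a rule.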


NLP for The Greek Language: A Longer Survey

Papantoniou, Katerina, Tzitzikas, Yannis

arXiv.org Artificial Intelligence

There is a wide variety of methods, tools and resources for processing text in the English language. However, this is not the case for the Greek language, even though it has a long documented history spanning at least 3,400 years of written records (including texts in syllabic script), and 28 centuries (Archaic period - now) of written text with alphabet [1, 2]. The Greek literary tradition of more than 2,500 years is also notable. To aid those who are interested in using, developing or advancing the techniques for Greek processing, in this paper we survey related works and resources organized in categories. We hope this collection and categorization of works will be useful for students and researchers interested in NLP tasks, Information Retrieval and Knowledge Management for the Greek language.


Assessing COVID-19 Impacts on College Students via Automated Processing of Free-form Text

Sharma, Ravi, Pagadala, Sri Divya, Bharti, Pratool, Chellappan, Sriram, Schmidt, Trine, Goyal, Raj

arXiv.org Artificial Intelligence

In this paper, we report experimental results on assessing the impact of COVID-19 on college students by processing free-form texts generated by them. By free-form texts, we mean textual entries posted by college students (enrolled in a four-year US college) via an app specifically designed to assess and improve their mental health. Using a dataset comprising more than 9000 textual entries from 1451 students collected over four months (split between pre and post COVID-19), and established NLP techniques, a) we assess how the topics of most interest to students change between pre and post COVID-19, and b) we assess the sentiments that students exhibit in each topic between pre and post COVID-19. Our analysis reveals that topics like Education became noticeably less important to students post COVID-19, while Health became much more prominent. We also found that across all topics, negative sentiment among students post COVID-19 was much higher compared to pre-COVID-19. We expect our study to have an impact on policy-makers in higher education across several spectra, including college administrators, teachers, parents, and mental health counselors.


Qualitative structure from motion

Weinshall, Daphna

Neural Information Processing Systems

I have presented a qualitative approach to the problem of recovering object structure from motion information and discussed some of its computational, psychophysical and implementational aspects. The computation of qualitative shape, as represented by the sign of the Gaussian curvature, can be performed by a field of simple operators, in parallel over the entire image. The performance of a qualitative shape detection module, implemented by an artificial neural network, appears to be similar to the performance of human subjects in an identical task.
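To make the quantity named in the abstract concrete, the sketch below computes the sign of the Gaussian curvature for a surface given explicitly as a height function z = f(x, y). This is only an illustration of the qualitative shape descriptor, not the paper's method, which recovers it from motion information. For such a surface, K = (f_xx f_yy - f_xy^2) / (1 + f_x^2 + f_y^2)^2, so the sign of K equals the sign of the Hessian determinant. The function name, step size `h`, and tolerance `tol` are assumptions chosen for this example.

```python
def gaussian_curvature_sign(f, x, y, h=1e-3, tol=1e-6):
    """Return +1 (elliptic), -1 (hyperbolic) or 0 (parabolic or flat)
    for the surface z = f(x, y) at the point (x, y)."""
    # Central finite-difference estimates of the second derivatives.
    f_xx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / (h * h)
    f_yy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / (h * h)
    f_xy = (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)
    # The denominator of K is always positive, so only this sign matters.
    det = f_xx * f_yy - f_xy * f_xy
    if det > tol:
        return 1
    if det < -tol:
        return -1
    return 0


print(gaussian_curvature_sign(lambda x, y: x * x + y * y, 0.3, -0.2))  # bowl
print(gaussian_curvature_sign(lambda x, y: x * x - y * y, 0.3, -0.2))  # saddle
print(gaussian_curvature_sign(lambda x, y: x * x, 0.3, -0.2))          # cylinder
```

Because the operator uses only a small neighborhood of each point, it could be evaluated independently at every image location, which matches the abstract's description of a field of simple operators working in parallel.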

